Machine Learning with Data Dependent Hypothesis Classes
نویسندگان
چکیده
We extend the VC theory of statistical learning to data dependent spaces of classifiers. This theory can be viewed as a decomposition of classifier design into two components; the first component is a restriction to a data dependent hypothesis class and the second is empirical risk minimization within that class. We define a measure of complexity for data dependent hypothesis classes and provide data dependent versions of bounds on error deviance and estimation error. We also provide a structural risk minimization procedure over data dependent hierarchies and prove consistency. We use this theory to provide a framework for studying the trade-offs between performance and computational complexity in classifier design. As a consequence we obtain a new family of classifiers with dimension independent performance bounds and efficient learning procedures.
منابع مشابه
A fuzzy mixed-integer goal programming model for a parallel machine scheduling problem with sequence-dependent setup times and release dates
This paper presents a new mixed-integer goal programming (MIGP) model for a parallel machine scheduling problem with sequence-dependent setup times and release dates. Two objectives are considered in the model to minimize the total weighted flow time and the total weighted tardiness simultaneously. Due to the com-plexity of the above model and uncertainty involved in real-world scheduling probl...
متن کاملEnhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
متن کاملComparative Analysis of Machine Learning Algorithms with Optimization Purposes
The field of optimization and machine learning are increasingly interplayed and optimization in different problems leads to the use of machine learning approaches. Machine learning algorithms work in reasonable computational time for specific classes of problems and have important role in extracting knowledge from large amount of data. In this paper, a methodology has been employed to opt...
متن کاملPerceptron and SVM learning with generalized cost models
Learning algorithms from the fields of artificial neural networks and machine learning, typically, do not take any costs into account or allow only costs depending on the classes of the examples that are used for learning. As an extension of class dependent costs, we consider costs that are example, i.e. feature and class dependent. We derive a costsensitive perceptron learning rule for non-sep...
متن کاملSelective Rademacher Penalization and Reduced Error Pruning of Decision Trees
Rademacher penalization is a modern technique for obtaining data-dependent bounds on the generalization error of classifiers. It appears to be limited to relatively simple hypothesis classes because of computational complexity issues. In this paper we, nevertheless, apply Rademacher penalization to the in practice important hypothesis class of unrestricted decision trees by considering the prun...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of Machine Learning Research
دوره 2 شماره
صفحات -
تاریخ انتشار 2002